The purpose of this tutorial is to use our edgelist data to analyse neuronal connectivity. An edgelist is a data frame of neuron-to-neuron connections, which describes a directed, weighted graph.
To do this, we need to think about directional connections between
upstream presynaptic neurons (pre) and downstream
postsynaptic ones (post).
We will represent the strength, i.e. “weight”, of these connections in two ways:
- count: The number of chemical synaptic contacts between two neurons
- norm: The count normalized by the total number of inputs that the target neuron (i.e., post) receives

Let’s start by choosing a dataset and a subset. By default, this notebook will look at BANC and its feeding and mechanosensation circuits of the suboesophageal zone (SEZ). It’s basically the lower part of the brain.
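The two weight columns are related by a simple division. As a toy illustration (the neuron IDs and counts below are made up, not real BANC data), norm can be recomputed from count and each target’s total input:

```r
# Toy edgelist illustrating the two weight definitions (hypothetical IDs and counts)
toy <- data.frame(
  pre   = c("A", "B", "A"),
  post  = c("C", "C", "D"),
  count = c(3, 7, 2)
)
# total_input: all synapses each postsynaptic neuron receives
toy$total_input <- ave(toy$count, toy$post, FUN = sum)
# norm: fraction of the target's total inputs contributed by this connection
toy$norm <- toy$count / toy$total_input
print(toy)
```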
This zone is a relatively under-explored part of the connectome, made of several neuropils: GNG, FLA, SAD, PRW and AMMC.
You can instead choose to work with the full dataset, or a different subset.
Currently working with:
The suboesophageal zone contains several distinct neuropils. Let’s visualize their 3D structures to understand their spatial organization.
Note on 3D mesh organization:
- Large anatomical regions (VNC, brain, etc.): obj/ directory
- Smaller specific neuropils (GNG, FLA, AMMC, etc.): obj/neuropils/ subdirectory
# Extract base dataset name
dataset_base <- sub("_[0-9]+$", "", dataset)
# Download and read meshes as hxsurf objects for nat+plotly
read_neuropil <- function(search, data_path, dataset_base, ext="neuropils"){
neuropil_path <- file.path(data_path, dataset_base, "obj", ext)
neuropil_files <- suppressWarnings(system(paste("gsutil ls", neuropil_path), intern = TRUE))
objs <- neuropil_files[grepl(search, basename(neuropil_files))]
if (length(objs) == 0) {
cat(" No files found matching:", search, "\n")
return(NULL)
}
# Read and convert the first matching file (typically only one per search term)
obj_file <- objs[1]
temp_file <- tempfile(fileext = ".obj")
system(paste("gsutil cp", obj_file, temp_file))
mesh <- as.hxsurf(rgl::readOBJ(temp_file))
return(mesh)
}
# Read brain mesh (large region)
brain_mesh <- read_neuropil("brain", data_path, dataset_base, ext = "")
# Read SEZ neuropils (smaller specific regions)
gng_mesh <- read_neuropil("GNG", data_path, dataset_base)
fla_l_mesh <- read_neuropil("FLA_L", data_path, dataset_base)
fla_r_mesh <- read_neuropil("FLA_R", data_path, dataset_base)
sad_mesh <- read_neuropil("SAD", data_path, dataset_base)
prw_mesh <- read_neuropil("PRW", data_path, dataset_base)
ammc_l_mesh <- read_neuropil("AMMC_L", data_path, dataset_base)
ammc_r_mesh <- read_neuropil("AMMC_R", data_path, dataset_base)
# Plot all meshes in 3D with nat+plotly
nclear3d()
# Plot brain as context (very transparent)
dummy<-plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
# Plot SEZ neuropils with distinct colors
dummy1<-plot3d(gng_mesh, col = "red", alpha = 0.5, add = TRUE)
dummy2<-plot3d(fla_l_mesh, col = "blue", alpha = 0.5, add = TRUE)
dummy3<-plot3d(fla_r_mesh, col = "lightblue", alpha = 0.5, add = TRUE)
dummy4<-plot3d(sad_mesh, col = "green", alpha = 0.5, add = TRUE)
dummy5<-plot3d(prw_mesh, col = "purple", alpha = 0.5, add = TRUE)
dummy6<-plot3d(ammc_l_mesh, col = "orange", alpha = 0.5, add = TRUE)
plot3d(ammc_r_mesh, col = "lightsalmon", alpha = 0.5, add = TRUE)
We will choose our dataset and optionally a pre-prepared subset. By default, let’s look at neurons of the SEZ.
# Extract base dataset name (e.g., "banc" from "banc_746")
dataset_base <- sub("_[0-9]+$", "", dataset)
# Construct paths
if (!is.null(subset_name)) {
# Use subset data
subset_dir <- file.path(data_path, dataset_base, subset_name)
edgelist_path <- file.path(subset_dir,
paste0(dataset, "_", subset_name, "_simple_edgelist.feather"))
cat("Using subset:", subset_name, "\n")
} else {
# Use full dataset
edgelist_path <- construct_path(data_path, dataset, "edgelist_simple")
cat("Using full dataset\n")
}
## Using subset: suboesophageal_zone
# Always need full meta data
meta_path <- construct_path(data_path, dataset, "meta")
Now read in the chosen edgelist:
# Read edgelist
edgelist <- read_feather_smart(edgelist_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/banc/suboesophageal_zone/banc_746_suboesophageal_zone_simple_edgelist.feather
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 27.11 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 857481 rows
# Display first few rows
head(edgelist)
## # A tibble: 6 × 5
## pre post count norm total_input
## <chr> <chr> <int> <dbl> <int>
## 1 720575941455094835 720575941505537986 1 0.000221 4520
## 2 720575941669120423 720575941633713339 4 0.00246 1625
## 3 720575941603036601 720575941535142488 72 0.0124 5799
## 4 720575941555376356 720575941554573313 1 0.00261 383
## 5 720575941391131286 720575941559112807 6 0.0366 164
## 6 720575941436164384 720575941461382413 1 0.00331 302
And get our meta data, subsetted by neurons present in the edgelist
(pre + post):
# Read full meta data
meta_full <- read_feather_smart(meta_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/banc/banc_746_meta.feather
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 9.84 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 168759 rows
# Get unique neuron IDs from edgelist
neuron_ids <- unique(c(edgelist$pre, edgelist$post))
# Subset meta data to neurons in edgelist
meta <- meta_full %>%
filter(!!sym(dataset_id) %in% neuron_ids)
# Display first few rows
head(meta)
## # A tibble: 6 × 18
## banc_746_id supervoxel_id region side hemilineage nerve flow super_class
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 72057594149663… 762110688716… centr… right putative_p… <NA> intr… visual_cen…
## 2 72057594146777… 768437684896… centr… left putative_p… <NA> intr… central_br…
## 3 72057594144513… 762810254327… centr… left putative_p… <NA> intr… central_br…
## 4 72057594157500… 769141373679… centr… left putative_p… <NA> intr… central_br…
## 5 72057594144288… 755769938568… centr… right putative_p… <NA> intr… central_br…
## 6 72057594165886… 764922683662… centr… left putative_p… <NA> intr… central_br…
## # ℹ 10 more variables: cell_class <chr>, cell_sub_class <chr>, cell_type <chr>,
## # neurotransmitter_predicted <chr>, neurotransmitter_score <dbl>,
## # cell_function <chr>, cell_function_detailed <chr>, body_part_sensory <chr>,
## # body_part_effector <chr>, status <chr>
cat("Meta data for", nrow(meta), "neurons\n")
## Meta data for 9605 neurons
cat("Unique pre neurons:", length(unique(edgelist$pre)), "\n")
## Unique pre neurons: 9601
cat("Unique post neurons:", length(unique(edgelist$post)), "\n")
## Unique post neurons: 9603
Let’s get our bearings and have a look at the meta data for our chosen edgelist.
Let’s see what super_class and cell_class
categories we have:
# Count by super_class
super_class_counts <- meta %>%
filter(!is.na(super_class)) %>%
count(super_class) %>%
arrange(desc(n)) %>%
as.data.frame() %>%
mutate(super_class=as.character(super_class))
# Reorder for plotting (set factor levels explicitly for ggplotly compatibility)
super_class_counts$super_class <- factor(
super_class_counts$super_class,
levels = super_class_counts$super_class[order(super_class_counts$n)]
)
# Create subtitle for plotly compatibility
plot_subtitle <- if(!is.null(subset_name)) paste("Subset:", subset_name) else "Full dataset"
# Plot (swap x and y, no coord_flip for ggplotly compatibility)
p_super <- ggplot(super_class_counts, aes(y = super_class, x = n)) +
geom_col(fill = "steelblue", alpha = 0.8) +
#geom_text(aes(label = n), hjust = -0.2, size = 3) +
labs(
title = paste("Neuron Super Classes:", dataset),
subtitle = plot_subtitle,
y = "Super Class",
x = "Number of Neurons"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
axis.text = element_text(size = 10)
)
save_plot(p_super, paste0(dataset, "_super_class"))
ggplotly(p_super)
If we have sensory (afferent) or effector (efferent) neurons, let’s
see what cell_class we have for each:
# Count by flow and cell_class
flow_subclass <- meta %>%
filter(flow %in% c("afferent", "efferent"),
!is.na(cell_class)) %>%
count(flow, cell_class) %>%
arrange(flow, desc(n)) %>%
group_by(flow) %>%
slice_head(n = 15) %>% # Top 15 per flow
ungroup()
if (nrow(flow_subclass) > 0) {
# Reorder for plotting within each flow group
flow_subclass <- flow_subclass %>%
group_by(flow) %>%
arrange(n) %>%
mutate(cell_class = factor(cell_class, levels = unique(cell_class))) %>%
ungroup()
p_flow <- ggplot(flow_subclass,
aes(y = cell_class, x = n, fill = flow)) +
geom_col(alpha = 0.8) +
geom_text(aes(label = n), hjust = -0.2, size = 3) +
facet_wrap(~flow, scales = "free_y", ncol = 1) +
scale_fill_manual(values = c("afferent" = "#E69F00", "efferent" = "#56B4E9")) +
labs(
title = "Sensory (Afferent) and Effector (Efferent) Neurons",
subtitle = "Top 15 cell classes per flow type",
x = "Number of Neurons",
y = "Cell Class"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "none",
strip.text = element_text(face = "bold", size = 11)
)
save_plot(p_flow, paste0(dataset, "_flow_subclass"), height = 10)
ggplotly(p_flow)
}
To understand whether connections are excitatory or inhibitory, we
can use predicted neurotransmitter information. The meta data contains
neurotransmitter_predicted and
neurotransmitter_score for each neuron.
We’ll first compute a consensus neurotransmitter for each cell type, taking the prediction with the highest mean score across that type’s neurons. Then we’ll assign signs to connections:
- Excitatory (sign: +1): acetylcholine, dopamine
- Inhibitory (sign: -1): glutamate, GABA, histamine, serotonin, octopamine
This allows us to create signed connectivity weights
(signed_norm) that capture both connection strength and
likely sign.
# Compute consensus neurotransmitter for each cell_type
celltype_nt <- meta %>%
filter(!is.na(cell_type), !is.na(neurotransmitter_predicted)) %>%
group_by(cell_type, neurotransmitter_predicted) %>%
summarise(
mean_score = mean(neurotransmitter_score, na.rm = TRUE),
n_neurons = n(),
.groups = "drop"
) %>%
group_by(cell_type) %>%
slice_max(mean_score, n = 1, with_ties = FALSE) %>%
select(cell_type, consensus_nt = neurotransmitter_predicted, nt_score = mean_score) %>%
ungroup()
# Assign signs based on neurotransmitter
assign_sign <- function(nt) {
case_when(
tolower(nt) %in% c("acetylcholine", "dopamine") ~ 1,
tolower(nt) %in% c("glutamate", "gaba", "histamine", "serotonin", "octopamine") ~ -1,
TRUE ~ NA_real_
)
}
celltype_nt <- celltype_nt %>%
mutate(sign = assign_sign(consensus_nt))
cat("Cell types with neurotransmitter predictions:", nrow(celltype_nt), "\n")
## Cell types with neurotransmitter predictions: 2738
cat("Excitatory cell types:", sum(celltype_nt$sign == 1, na.rm = TRUE), "\n")
## Excitatory cell types: 1530
cat("Inhibitory cell types:", sum(celltype_nt$sign == -1, na.rm = TRUE), "\n")
## Inhibitory cell types: 1208
Now we’ll add the signed_norm column to our edgelist. For each connection, we multiply the normalized weight by the sign of the presynaptic neuron’s neurotransmitter:
# Add neurotransmitter info to edgelist via meta
edgelist <- edgelist %>%
left_join(
meta %>% select(id = !!sym(dataset_id),
pre_cell_type = cell_type),
by = c("pre" = "id")
) %>%
left_join(
celltype_nt %>% select(cell_type, pre_sign = sign),
by = c("pre_cell_type" = "cell_type")
) %>%
mutate(
# Use cell_type sign, default to excitatory if unknown
pre_sign = coalesce(pre_sign, 1),
signed_norm = norm * pre_sign
) %>%
select(-pre_cell_type, -pre_sign)
cat("Edgelist now includes signed_norm column\n")
## Edgelist now includes signed_norm column
cat("Positive connections (excitatory):", sum(edgelist$signed_norm > 0, na.rm = TRUE), "\n")
## Positive connections (excitatory): 462454
cat("Negative connections (inhibitory):", sum(edgelist$signed_norm < 0, na.rm = TRUE), "\n")
## Negative connections (inhibitory): 395027
Here are some normalised density plots comparing the distributions of excitatory and inhibitory neuron-neuron connection strengths.
# Classify connections by sign
edgelist_signed <- edgelist %>%
mutate(
connection_type = case_when(
signed_norm > 0 ~ "Excitatory",
signed_norm < 0 ~ "Inhibitory",
TRUE ~ "Unknown"
)
) %>%
filter(connection_type != "Unknown")
# Create density plot of absolute weights by connection type
p_signed_dist <- ggplot(edgelist_signed,
aes(x = abs(signed_norm),
color = connection_type,
fill = connection_type)) +
geom_density(alpha = 0.3, linewidth = 1) +
scale_x_log10() +
scale_color_manual(
values = c("Excitatory" = "red", "Inhibitory" = "blue"),
name = "Connection Type"
) +
scale_fill_manual(
values = c("Excitatory" = "red", "Inhibitory" = "blue"),
name = "Connection Type"
) +
labs(
title = "Distribution of Connection Strengths by Sign",
subtitle = "Based on predicted neurotransmitter type",
x = "Normalized Weight (absolute, log scale)",
y = "Density"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
save_plot(p_signed_dist, paste0(dataset, "_signed_weight_distribution"))
ggplotly(p_signed_dist)
# Print summary statistics
edgelist_signed %>%
group_by(connection_type) %>%
summarise(
mean_weight = mean(abs(signed_norm), na.rm = TRUE),
median_weight = median(abs(signed_norm), na.rm = TRUE),
.groups = "drop"
) %>%
print()
## # A tibble: 2 × 3
## connection_type mean_weight median_weight
## <chr> <dbl> <dbl>
## 1 Excitatory 0.00506 0.00202
## 2 Inhibitory 0.00756 0.00240
cat("Excitatory connections:", sum(edgelist_signed$connection_type == "Excitatory"), "\n")
## Excitatory connections: 462454
cat("Inhibitory connections:", sum(edgelist_signed$connection_type == "Inhibitory"), "\n")
## Inhibitory connections: 395027
However, it is very important not to overplay this idea of being able to assign signs to connections. In the fly, glutamate can be excitatory or inhibitory. In addition, inhibition doesn’t only quell activity; it can also cause it, e.g. through disinhibition in networks. Consider the visual system: all of its inputting sensory neurons, the photoreceptors, are histaminergic and inhibit their targets.
It is generally better to work with unsigned edgelists and methods. The exception is when examining direct connectivity, or building concise circuit diagrams.
Next, we can get a basic sense of the graph. We can do this using the igraph library.
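As a minimal sketch of this first pass (graph_from_data_frame, vcount, ecount and edge_density are standard igraph functions, but the toy edgelist below is invented; in the notebook you would pass the edgelist’s pre, post and count columns):

```r
library(igraph)

# Toy directed, weighted edgelist (hypothetical neurons; stands in for pre/post/count)
toy_el <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "C", "A"),
  weight = c(5, 2, 7, 1)
)
g <- graph_from_data_frame(toy_el, directed = TRUE)

# Basic graph summaries: node count, edge count, and edge density
cat("Nodes:", vcount(g), "\n")
cat("Edges:", ecount(g), "\n")
cat("Density:", round(edge_density(g), 3), "\n")
```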
We can see a scatter plot of both count and
norm, and observe their correlation:
# Sample if too many points
if (nrow(edgelist) > 50000) {
edgelist_sample <- edgelist %>% sample_n(50000)
} else {
edgelist_sample <- edgelist
}
# Create scatter plot with marginal histograms
p_scatter <- ggplot(edgelist_sample, aes(x = count, y = norm)) +
geom_point(alpha = 0.3, color = "steelblue", size = 1) +
geom_smooth(method = "lm", color = "red", se = TRUE, alpha = 0.2) +
scale_x_log10() +
scale_y_log10() +
labs(
title = "Relationship between Synapse Count and Normalized Weight",
x = "Synapse Count (log scale)",
y = "Normalized Weight (log scale)",
subtitle = sprintf("Correlation: %.3f (Spearman)",
cor(edgelist$count, edgelist$norm, method = "spearman"))
) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
save_plot(p_scatter, paste0(dataset, "_weight_correlation"))
p_scatter
We can see density plots of input and output degrees with different synapse count thresholds:
# Calculate in-degree and out-degree with two thresholds
# Threshold 1: count > 1
degree_threshold1 <- bind_rows(
edgelist %>% filter(count > 1) %>% count(post, name = "degree") %>%
mutate(type = "In-degree", threshold = ">1 synapse"),
edgelist %>% filter(count > 1) %>% count(pre, name = "degree") %>%
mutate(type = "Out-degree", threshold = ">1 synapse")
)
# Threshold 2: count > 10
degree_threshold10 <- bind_rows(
edgelist %>% filter(count > 10) %>% count(post, name = "degree") %>%
mutate(type = "In-degree", threshold = ">10 synapses"),
edgelist %>% filter(count > 10) %>% count(pre, name = "degree") %>%
mutate(type = "Out-degree", threshold = ">10 synapses")
)
# Combine
degree_data <- bind_rows(degree_threshold1, degree_threshold10) %>%
mutate(
threshold = factor(threshold, levels = c(">1 synapse", ">10 synapses"))
)
# Plot normalized density
p_degree <- ggplot(degree_data, aes(x = degree, color = type, linetype = threshold)) +
geom_density(alpha = 0.3, linewidth = 1) +
scale_color_manual(
values = c("In-degree" = "#E69F00", "Out-degree" = "#56B4E9"),
name = "Direction"
) +
scale_linetype_manual(
values = c(">1 synapse" = "solid", ">10 synapses" = "dashed"),
name = "Threshold"
) +
scale_x_log10() +
labs(
title = "Degree Distribution by Synapse Count Threshold",
subtitle = "Normalized density plots",
x = "Degree (log scale)",
y = "Density"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
legend.position = "right"
)
save_plot(p_degree, paste0(dataset, "_degree_distribution"))
ggplotly(p_degree)
We can use a network graph plot to look at connectivity when we only have a few nodes.
Since we have many neurons, we will first collapse our edgelist by
super_class by joining with meta. We will
remove cases where we have a super_class of
NA:
# Collapse by super_class and remove self-connections
edgelist_super <- edgelist %>%
left_join(meta %>% select(id = !!sym(dataset_id),
pre_super_class = super_class),
by = c("pre" = "id")) %>%
left_join(meta %>% select(id = !!sym(dataset_id),
post_super_class = super_class),
by = c("post" = "id")) %>%
filter(!is.na(pre_super_class), !is.na(post_super_class),
pre_super_class != post_super_class) %>% # Remove self-connections
group_by(pre_super_class, post_super_class) %>%
summarise(
synapse_count = sum(count, na.rm = TRUE),
weight = sum(count, na.rm = TRUE),
n_connections = n(),
.groups = "drop"
) %>%
filter(synapse_count >= 50) # Keep substantial connections (by synapse count)
# Create vertices
vertices_super <- data.frame(
name = unique(c(edgelist_super$pre_super_class,
edgelist_super$post_super_class))
) %>%
left_join(meta %>%
count(super_class) %>%
rename(name = super_class, size = n),
by = "name")
# Create graph
g_super <- graph_from_data_frame(
d = edgelist_super %>% select(pre_super_class, post_super_class,
weight, synapse_count),
vertices = vertices_super,
directed = TRUE
)
# Convert to tidygraph and add node attributes
g_super_tidy <- as_tbl_graph(g_super) %>%
activate(nodes) %>%
mutate(
degree = centrality_degree(mode = "total"),
super_class = name # Add super_class column for coloring
)
# Create layout with increased repulsion
layout_super <- create_layout(g_super_tidy, layout = "fr", niter = 1000)
# Create subtitle for plotly compatibility
network_subtitle <- paste(dataset, "-", if(!is.null(subset_name)) subset_name else "Full dataset")
# Plot with colors by super_class
p_network <- ggraph(layout_super) +
geom_edge_arc(
aes(width = weight, alpha = weight, color = node1.super_class),
arrow = arrow(length = unit(3, 'mm'), type = "closed"),
start_cap = circle(5, 'mm'),
end_cap = circle(5, 'mm'),
strength = 0.3
) +
geom_node_point(aes(size = degree, color = super_class), alpha = 0.8) +
geom_node_text(aes(label = name), repel = TRUE, size = 3, fontface = "bold") +
scale_edge_width(range = c(0.2, 2), name = "Synapse\nCount") +
scale_edge_alpha(range = c(0.5, 1.0), name = "Synapse\nCount") +
scale_size_continuous(range = c(3, 10), name = "Degree") +
scale_color_discrete(name = "Super Class") +
scale_edge_color_discrete(name = "Source\nSuper Class") +
labs(
title = "Connectivity Network by Super Class",
subtitle = network_subtitle
) +
theme_graph() +
theme(
legend.position = "right",
plot.title = element_text(face = "bold", size = 14)
)
save_plot(p_network, paste0(dataset, "_network_super_class"), width = 12, height = 10)
p_network
One good way to look at connectivity directly is to visualize a connectivity matrix.
We will put pre on the rows and post on the
columns.
Since we have many neurons, we will first collapse our edgelist by
cell_type:
# Collapse by cell_type
edgelist_celltype_raw <- edgelist %>%
left_join(meta %>% select(id = !!sym(dataset_id),
pre_cell_type = cell_type,
pre_flow = flow),
by = c("pre" = "id")) %>%
left_join(meta %>% select(id = !!sym(dataset_id),
post_cell_type = cell_type,
post_flow = flow),
by = c("post" = "id")) %>%
filter(!is.na(pre_cell_type), !is.na(post_cell_type))
# Calculate total inputs per post_cell_type for normalization
post_totals <- edgelist_celltype_raw %>%
group_by(post_cell_type) %>%
summarise(post_total_count = sum(count, na.rm = TRUE), .groups = "drop")
# Aggregate by cell type and recalculate norm
edgelist_celltype <- edgelist_celltype_raw %>%
group_by(pre_cell_type, post_cell_type, pre_flow, post_flow) %>%
summarise(
total_count = sum(count, na.rm = TRUE),
# Get the modal sign for this cell type pair (most common sign)
modal_sign = ifelse(sum(signed_norm > 0, na.rm = TRUE) > sum(signed_norm < 0, na.rm = TRUE), 1, -1),
.groups = "drop"
) %>%
left_join(post_totals, by = "post_cell_type") %>%
mutate(
norm = total_count / post_total_count,
# Use modal sign to create signed_norm (preserves magnitude)
signed_norm = norm * modal_sign
) %>%
select(-post_total_count, -modal_sign) %>%
filter(total_count>=10)
head(edgelist_celltype)
## # A tibble: 6 × 7
## pre_cell_type post_cell_type pre_flow post_flow total_count norm
## <chr> <chr> <chr> <chr> <int> <dbl>
## 1 5-HTPMPV03 SAD077 intrinsic intrinsic 12 0.00214
## 2 AL-AST1 ALIN4 intrinsic intrinsic 18 0.0138
## 3 AL-AST1 ALIN7 intrinsic intrinsic 18 0.0352
## 4 AL-AST1 ALON3 intrinsic intrinsic 10 0.0344
## 5 AL-AST1 AN01A055 intrinsic intrinsic 26 0.00767
## 6 AL-AST1 AN_LH_AVLP_1 intrinsic intrinsic 10 0.02
## # ℹ 1 more variable: signed_norm <dbl>
Next, we can choose the strongest cell type-to-cell type connections to visualize, i.e., those above the 95th percentile:
# Calculate threshold
threshold_95 <- quantile(edgelist_celltype$total_count, 0.95)
# Filter for strong connections
edgelist_strong <- edgelist_celltype %>%
filter(total_count >= threshold_95)
cat("Connections above 95th percentile (>", round(threshold_95), "synapses):",
nrow(edgelist_strong), "\n")
## Connections above 95th percentile (> 140 synapses): 3142
And then we can plot our connectivity matrix. We are looking at very many neurons, so let’s also make things interesting by just looking at cell types upstream of the motor neurons that control the proboscis.
# Proboscis motor cell types
proboscis_motor_cts <- meta %>%
filter(grepl("proboscis",body_part_effector),
!is.na(cell_type)) %>%
pull(cell_type)
# Prepare data for heatmap (aggregate first in case of duplicates)
conn_heatmap_data <- edgelist_celltype %>%
filter(pre_cell_type %in% proboscis_motor_cts | post_cell_type %in% proboscis_motor_cts) %>%
group_by(pre_cell_type, post_cell_type) %>%
summarise(
signed_norm = mean(signed_norm, na.rm = TRUE),
.groups = "drop"
)
# Create matrix of signed weights for the heatmap
conn_matrix <- conn_heatmap_data %>%
tidyr::pivot_wider(
names_from = post_cell_type,
values_from = signed_norm,
values_fill = 0
) %>%
tibble::column_to_rownames("pre_cell_type") %>%
as.matrix()
conn_matrix[is.na(conn_matrix)] <- 0
conn_matrix[is.infinite(conn_matrix)] <- 0
# Cap color scale at the 25th and 75th percentiles for better visibility
p25 <- quantile(conn_matrix, 0.25, na.rm = TRUE)
p75 <- quantile(conn_matrix, 0.75, na.rm = TRUE)
max_abs_val <- max(abs(c(p25, p75)))
# Ensure max_abs_val is not zero (which would cause non-unique breaks)
if (max_abs_val == 0 || is.na(max_abs_val)) {
# Fall back to using the actual range of the data
max_abs_val <- max(abs(conn_matrix), na.rm = TRUE)
if (max_abs_val == 0 || is.na(max_abs_val)) {
max_abs_val <- 1 # Default fallback
}
}
# Create static heatmap with pheatmap
pheatmap(
conn_matrix,
clustering_method = "ward.D2",
clustering_distance_rows = "euclidean",
clustering_distance_cols = "euclidean",
color = colorRampPalette(c("blue", "white", "red"))(256),
breaks = seq(-max_abs_val, max_abs_val, length.out = 257),
show_rownames = TRUE,
show_colnames = TRUE,
main = "Signed Connectivity Matrix: Strong Connections",
filename = file.path(img_dir, paste0(dataset, "_conn_matrix_strong.png")),
width = 10,
height = 16
)
# Create interactive heatmap with Ward's clustering (no dendrograms shown)
p_matrix <- heatmaply(
conn_matrix,
dendrogram = "none",
hclust_method = "ward.D2",
dist_method = function(x) dist(abs(x)),
colors = colorRampPalette(c("blue", "cyan", "white", "yellow", "red"))(256),
limits = c(-max_abs_val, max_abs_val),
main = "Signed Connectivity Matrix: Strong Connections (>95th percentile)",
xlab = "Postsynaptic Cell Type",
ylab = "Presynaptic Cell Type",
showticklabels = c(FALSE, FALSE),
hide_colorbar = FALSE,
fontsize_row = 6,
fontsize_col = 6,
key.title = "Signed\nWeight"
)
p_matrix
In general, one thing I find useful is to look at sensory neurons
(flow == "afferent") and effector (i.e., motor or
endocrine) neurons (flow == "efferent"), because they are
quite interpretable.
For these neurons, cell_class and/or
cell_sub_class is often the most useful label.
For sensory and motor neurons, this label can tell us about innervation of exterior body parts, as well as internal ones.
Let’s re-collapse our edgelist, but by cell_class for
sensory/effector neurons, and cell_type for everything
else.
Let’s take every cell type with at least 100 connections from a sensory neuron, and visualize this as a heatmap:
# Collapse with mixed labels
edgelist_mixed <- edgelist %>%
left_join(meta %>% select(id = !!sym(dataset_id),
pre_cell_type = cell_type,
pre_cell_class = cell_class,
pre_flow = flow),
by = c("pre" = "id")) %>%
left_join(meta %>% select(id = !!sym(dataset_id),
post_cell_type = cell_type,
post_cell_class = cell_class,
post_flow = flow),
by = c("post" = "id")) %>%
mutate(
pre_label = ifelse(pre_flow %in% c("afferent", "efferent") & !is.na(pre_cell_class),
pre_cell_class, pre_cell_type),
post_label = ifelse(post_flow %in% c("afferent", "efferent") & !is.na(post_cell_class),
post_cell_class, post_cell_type)
) %>%
filter(!is.na(pre_label), !is.na(post_label))
# Calculate total inputs per post_label for sensory outputs
sensory_post_totals <- edgelist_mixed %>%
filter(pre_flow == "afferent") %>%
group_by(post_label) %>%
summarise(post_total_count = sum(count, na.rm = TRUE), .groups = "drop")
# Sensory outputs (at least 100 synapses)
sensory_outputs <- edgelist_mixed %>%
filter(pre_flow == "afferent") %>%
group_by(pre_label, post_label) %>%
summarise(
total_count = sum(count, na.rm = TRUE),
.groups = "drop"
) %>%
left_join(sensory_post_totals, by = "post_label") %>%
mutate(norm = total_count / post_total_count) %>%
select(-post_total_count)
# Create matrix for heatmap
sensory_matrix <- sensory_outputs %>%
group_by(pre_label, post_label) %>%
summarise(norm = mean(norm, na.rm = TRUE), .groups = "drop") %>%
tidyr::pivot_wider(
names_from = post_label,
values_from = norm,
values_fill = 0
) %>%
tibble::column_to_rownames("pre_label") %>%
as.matrix()
sensory_matrix[is.na(sensory_matrix)] <- 0
sensory_matrix[is.infinite(sensory_matrix)] <- 0
sensory_matrix <- sensory_matrix[rowSums(sensory_matrix)>10,]
# Create static heatmap with pheatmap
pheatmap(
sensory_matrix,
clustering_method = "ward.D2",
show_rownames = TRUE,
show_colnames = FALSE,
main = "Sensory Neuron Outputs",
filename = file.path(img_dir, paste0(dataset, "_sensory_outputs.png")),
width = 10,
height = 10
)
# Create interactive heatmap with Ward's clustering (no dendrograms shown)
p_sensory <- heatmaply(
sensory_matrix,
dendrogram = "none",
hclust_method = "ward.D2",
main = "Sensory Neuron Targeting",
xlab = "Target Neuron Type",
ylab = "Sensory Neuron Type",
showticklabels = c(FALSE, TRUE),
hide_colorbar = FALSE,
fontsize_row = 6,
fontsize_col = 6,
key.title = "Normalized\nWeight"
)
p_sensory
Interpretable!
If your sample does not have sensory or effector neurons, can you think of well-characterized cell types it does contain, to help you further subset and think about your data?
Now try this analysis yourself with a different dataset!
Exercise: Switch the pre-prepared subset to
front_leg
# To work with a different subset, change these variables at the top:
# subset_name <- "front_leg"
# neuropil_pattern <- "T1"
# Then re-run the entire notebook to see how the results differ!
Below are more involved analyses, with longer compute times. Working through these will show you how to cluster neurons by their connection similarity, and investigate the groups.
There are many different ways to cluster nodes by their connectivity. For example, we could use modularity-based methods such as the Louvain algorithm, which is implemented in igraph.
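For completeness, here is a hedged sketch of Louvain clustering with igraph. Note that cluster_louvain() only accepts undirected graphs, so a directed connectome graph would need as.undirected() first; the toy edgelist below, two dense triangles joined by one weak edge, is invented for illustration:

```r
library(igraph)

# Toy undirected graph: two densely connected triangles joined by one weak edge
toy_el <- data.frame(
  from   = c("A", "A", "B", "D", "D", "E", "C"),
  to     = c("B", "C", "C", "E", "F", "F", "D"),
  weight = c(5, 4, 6, 5, 4, 6, 1)
)
g <- graph_from_data_frame(toy_el, directed = FALSE)

# Louvain community detection, weighting edges by synapse count;
# membership() returns the community assignment per node
comm <- cluster_louvain(g, weights = E(g)$weight)
membership(comm)
```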
Here, we can take a simple but effective method. First, we will calculate the cosine similarity between all neurons in our edgelist, based on both their input and output connectivity:
# Create connectivity matrix (neurons x partners)
# Combine both pre and post connections for each neuron
# Get all neurons
all_neurons <- union(edgelist$pre, edgelist$post)
# Filter for neurons with sufficient connections
conn_counts <- data.frame(
id = c(edgelist$pre, edgelist$post)
) %>%
count(id) %>%
filter(n >= 10) # At least 10 connections
neurons_to_use <- intersect(all_neurons, conn_counts$id)
# Create sparse matrix for both inputs and outputs
# Rows = neurons, Cols = all partners (pre or post)
edgelist_filtered <- edgelist %>%
filter(pre %in% neurons_to_use | post %in% neurons_to_use) %>%
mutate(norm = ifelse(is.na(norm) | norm == 0, 0.001, norm)) # Avoid zeros
# Prepare for matrix creation
conn_data <- bind_rows(
edgelist_filtered %>%
filter(pre %in% neurons_to_use) %>%
select(neuron = pre, partner = post, weight = norm) %>%
mutate(type = "output"),
edgelist_filtered %>%
filter(post %in% neurons_to_use) %>%
select(neuron = post, partner = pre, weight = norm) %>%
mutate(type = "input")
) %>%
mutate(partner_type = paste(type, partner, sep = "_")) # Distinguish input vs output
# Create sparse matrix
neuron_factor <- factor(conn_data$neuron, levels = neurons_to_use)
partner_factor <- factor(conn_data$partner_type)
inout_connection_matrix <- sparseMatrix(
i = as.integer(neuron_factor),
j = as.integer(partner_factor),
x = conn_data$weight,
dims = c(length(neurons_to_use), nlevels(partner_factor)),
dimnames = list(neurons_to_use, levels(partner_factor))
)
# Remove all-zero rows
non_zero_rows <- which(rowSums(abs(inout_connection_matrix)) > 0.00001)
inout_connection_matrix <- inout_connection_matrix[non_zero_rows, ]
# Remove all-zero columns
non_zero_cols <- which(colSums(abs(inout_connection_matrix)) > 0.00001)
inout_connection_matrix <- inout_connection_matrix[, non_zero_cols]
# Calculate sparsity
sparsity <- sum(inout_connection_matrix == 0) / prod(dim(inout_connection_matrix))
# Calculate cosine similarity
sparse_matrix <- as(as.matrix(t(inout_connection_matrix)), "dgCMatrix")
undirected_cosine_sim_matrix <- cosine_similarity_sparse(sparse_matrix)
undirected_cosine_sim_matrix[is.infinite(undirected_cosine_sim_matrix)] <- 0
cat("Matrix sparsity:", round(sparsity * 100, 2), "%\n")
## Matrix sparsity: 99.07 %
cat("Dimensions:", nrow(undirected_cosine_sim_matrix), "x",
ncol(undirected_cosine_sim_matrix), "\n")
## Dimensions: 9601 x 9601
We can use these similarity scores to build a UMAP representation:
# Represent as UMAP
set.seed(42)
umap_result <- uwot::umap(
undirected_cosine_sim_matrix,
metric = "cosine",
n_epochs = 500,
n_neighbors = min(30, nrow(undirected_cosine_sim_matrix) - 1),
min_dist = 0.1,
n_trees = 50,
n_components = 2,
verbose = FALSE
)
# Create a data frame with UMAP coordinates
umap_df <- data.frame(
UMAP1 = umap_result[, 1],
UMAP2 = umap_result[, 2],
id = rownames(undirected_cosine_sim_matrix)
) %>%
left_join(
meta %>% select(
id = !!sym(dataset_id),
cell_type, super_class, cell_class, cell_sub_class,
flow, region, hemilineage
),
by = "id"
)
# Plot UMAP colored by super_class with hover text showing ID and cell_type
p_umap_super <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = super_class,
text = paste0("ID: ", id, "\nCell Type: ", cell_type))) +
geom_point(alpha = 0.6, size = 1.5) +
labs(
title = "UMAP of Connectivity Patterns",
subtitle = "Colored by super_class",
color = "Super Class"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
save_plot(p_umap_super, paste0(dataset, "_umap_super_class"))
ggplotly(p_umap_super, tooltip = "text")
Next, we will perform hierarchical clustering on the UMAP coordinates to identify connectivity-based clusters:
# Perform hierarchical clustering
dist_matrix <- dist(umap_result, method = "euclidean")
hc <- hclust(dist_matrix, method = "ward.D2")
# Dynamic tree cutting for automatic cluster detection
if (require(dynamicTreeCut, quietly = TRUE)) {
dynamic_clusters <- cutreeDynamic(
hc,
distM = as.matrix(dist_matrix),
deepSplit = 2,
minClusterSize = max(5, round(nrow(umap_df) * 0.01))
)
} else {
# Fallback: cut the tree into a fixed number of clusters
dynamic_clusters <- cutree(hc, k = min(12, ceiling(nrow(umap_df) / 10)))
}
## ..cutHeight not given, setting it to 606 ===> 99% of the (truncated) height range in dendro.
## ..done.
umap_df$unordered_cluster <- factor(dynamic_clusters)
# Calculate centroids of clusters
centroids <- umap_df %>%
group_by(unordered_cluster) %>%
summarize(
UMAP1_centroid = mean(UMAP1),
UMAP2_centroid = mean(UMAP2),
size = n()
)
print(centroids %>% arrange(desc(size)))
## # A tibble: 9 × 4
## unordered_cluster UMAP1_centroid UMAP2_centroid size
## <fct> <dbl> <dbl> <int>
## 1 1 0.0590 -0.758 1586
## 2 2 -3.99 -4.53 1582
## 3 3 4.68 2.85 1169
## 4 4 -6.38 -2.18 1065
## 5 5 -5.37 0.846 1055
## 6 6 8.11 -1.21 1031
## 7 7 0.967 1.81 778
## 8 8 -0.329 5.67 769
## 9 9 7.69 3.43 566
# Calculate pairwise distances between centroids
dist_centroids <- dist(centroids[, c("UMAP1_centroid", "UMAP2_centroid")],
method = "euclidean")
# Order clusters based on hierarchical clustering of centroids
hc_centroids <- hclust(dist_centroids, method = "ward.D2")
dd <- as.dendrogram(hc_centroids)
ordered_cluster <- seq_along(order.dendrogram(dd))
names(ordered_cluster) <- order.dendrogram(dd)
# Map original cluster numbers to new ordered cluster numbers
umap_df$cluster <- ordered_cluster[as.character(umap_df$unordered_cluster)]
umap_df$cluster <- factor(umap_df$cluster, levels = sort(unique(umap_df$cluster)))
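The relabelling trick above is compact but easy to misread, so here is a toy illustration (with a made-up leaf order standing in for `order.dendrogram(dd)`): each original cluster id gets replaced by its left-to-right position in the centroid dendrogram.

```r
# Toy example, not from the tutorial: pretend the dendrogram leaves appear
# in the order 3, 1, 2 from left to right
leaf_order <- c(3, 1, 2)             # stand-in for order.dendrogram(dd)
relabel <- seq_along(leaf_order)     # new labels: 1, 2, 3
names(relabel) <- leaf_order         # keyed by the original cluster id
relabel[as.character(c(1, 2, 3))]    # original 1, 2, 3 -> new 2, 3, 1
```

So clusters that sit next to each other in the dendrogram end up with adjacent labels, which makes downstream plots easier to read.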
# Create color palette
n_clusters <- length(unique(umap_df$cluster))
cluster_colors <- colorRampPalette(c("#E41A1C", "#377EB8", "#4DAF4A",
"#984EA3", "#FF7F00", "#FFFF33"))(n_clusters)
names(cluster_colors) <- sort(unique(umap_df$cluster))
# Calculate cluster centroids for labeling
cluster_centroids <- umap_df %>%
group_by(cluster) %>%
summarise(
UMAP1 = mean(UMAP1),
UMAP2 = mean(UMAP2),
n = n()
)
# Plot UMAP colored by cluster with hover text showing ID and cell_type
p_umap_clusters <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = cluster,
text = paste0("ID: ", id, "\nCell Type: ", cell_type))) +
geom_point(alpha = 0.7, size = 2) +
scale_color_manual(values = cluster_colors) +
geom_text(
data = cluster_centroids,
aes(x = UMAP1, y = UMAP2, label = cluster),
color = "black",
size = 5,
fontface = "bold",
inherit.aes = FALSE
) +
labs(
title = "Connectivity-Based Clusters",
subtitle = sprintf("%d clusters identified by hierarchical clustering", n_clusters)
) +
theme_void() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
legend.position = "none"
)
save_plot(p_umap_clusters, paste0(dataset, "_umap_clusters"))
ggplotly(p_umap_clusters, tooltip = "text")
cat("Found", length(unique(dynamic_clusters)), "clusters\n")
## Found 9 clusters
Let’s examine what cell types are in each cluster:
# Summarize clusters by cell_type and super_class
cluster_summary <- umap_df %>%
filter(!is.na(cluster)) %>%
count(cluster, super_class, cell_type) %>%
arrange(cluster, desc(n))
# Top cell types per cluster
top_types_per_cluster <- cluster_summary %>%
group_by(cluster) %>%
slice_head(n = 3) %>%
summarise(
top_types = paste(cell_type, collapse = ", "),
.groups = "drop"
)
print(top_types_per_cluster)
## # A tibble: 9 × 2
## cluster top_types
## <fct> <chr>
## 1 1 SApp10, SApp09, SApp22
## 2 2 JO-E, JO-B, CB0958
## 3 3 JO-B, JO-E, JO-A
## 4 4 CB1379, ENS3, unknown
## 5 5 NA, claw_tpGRN, TPMN2
## 6 6 LB3b-c, SAch02, claw_tpGRN
## 7 7 BM_InOm, BM_Taste, BM_vOcci_vPoOr
## 8 8 CL121,CL122, AN08B099, NA
## 9 9 DNg12_e, DNg12_a, SAch01
# Super class composition
cluster_super <- umap_df %>%
filter(!is.na(cluster), !is.na(super_class)) %>%
count(cluster, super_class) %>%
group_by(cluster) %>%
mutate(proportion = n / sum(n)) %>%
ungroup()
# Plot super class composition
p_cluster_comp <- ggplot(cluster_super,
aes(x = cluster, y = proportion, fill = super_class)) +
geom_col() +
scale_fill_brewer(palette = "Set3") +
labs(
title = "Cluster Composition by Super Class",
x = "Cluster",
y = "Proportion",
fill = "Super Class"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
axis.text.x = element_text(angle = 0)
)
save_plot(p_cluster_comp, paste0(dataset, "_cluster_composition"))
ggplotly(p_cluster_comp)
Let’s create a special UMAP visualization highlighting sensory and effector neurons:
# Add flow information to UMAP data
umap_df_flow <- umap_df %>%
mutate(
is_sensory = super_class == "sensory",
is_effector = flow == "efferent", # Use flow column for effector neurons
display_type = case_when(
is_sensory ~ "Sensory",
is_effector ~ "Effector",
TRUE ~ "Other"
),
point_shape = case_when(
is_sensory ~ "circle",
is_effector ~ "square",
TRUE ~ "circle"
),
display_label = case_when(
is_sensory ~ cell_sub_class,
is_effector ~ cell_class,
TRUE ~ "Other"
)
)
# Create plot with hover text showing ID and cell_type
p_umap_sensory_effector <- ggplot(umap_df_flow, aes(x = UMAP1, y = UMAP2)) +
# Plot "Other" neurons in grey first
geom_point(
data = umap_df_flow %>% filter(display_type == "Other"),
aes(color = display_type, text = paste0("ID: ", id, "\nCell Type: ", cell_type)),
alpha = 0.3,
size = 1.5,
shape = 16
) +
# Plot sensory neurons (circles) colored by cell_sub_class
geom_point(
data = umap_df_flow %>% filter(is_sensory),
aes(color = display_label, text = paste0("ID: ", id, "\nCell Type: ", cell_type)),
alpha = 0.8,
size = 2.5,
shape = 16 # circle
) +
# Plot effector neurons (squares) colored by cell_class
geom_point(
data = umap_df_flow %>% filter(is_effector),
aes(color = display_label, text = paste0("ID: ", id, "\nCell Type: ", cell_type)),
alpha = 0.8,
size = 2.5,
shape = 15 # square
) +
scale_color_manual(
values = c(
"Other" = "grey70",
setNames(
rainbow(length(unique(c(
umap_df_flow$cell_sub_class[umap_df_flow$is_sensory],
umap_df_flow$cell_class[umap_df_flow$is_effector]
)))),
unique(c(
umap_df_flow$cell_sub_class[umap_df_flow$is_sensory],
umap_df_flow$cell_class[umap_df_flow$is_effector]
))
)
),
name = "Cell Sub-Class"
) +
labs(
title = "UMAP: Sensory and Effector Neurons",
subtitle = "Sensory = circles, Effector = squares, colored by cell sub-class",
x = "UMAP 1",
y = "UMAP 2"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
save_plot(p_umap_sensory_effector, paste0(dataset, "_umap_sensory_effector"))
ggplotly(p_umap_sensory_effector, tooltip = "text")
Finally, let’s create a summary network where each node represents either:

- A connectivity-based cluster (for internal neurons)
- Individual sensory cell classes (grey nodes)
- Individual effector cell classes (black nodes)
# Create node assignments: cluster for interneurons, cell_class for sensory/effector
umap_df_annotated <- umap_df %>%
mutate(
node_label = case_when(
flow == "afferent" & !is.na(cell_class) ~ paste0("Sensory: ", cell_class),
flow == "efferent" & !is.na(cell_class) ~ paste0("Effector: ", cell_class),
TRUE ~ paste0("Cluster ", cluster)
),
node_type = case_when(
flow == "afferent" ~ "sensory",
flow == "efferent" ~ "effector",
TRUE ~ "cluster"
)
)
# Create edgelist with node labels (also track cluster for coloring)
edgelist_cluster_network <- edgelist %>%
left_join(
umap_df_annotated %>% select(id, pre_node_label = node_label, pre_node_type = node_type, pre_cluster = cluster),
by = c("pre" = "id")
) %>%
left_join(
umap_df_annotated %>% select(id, post_node_label = node_label, post_node_type = node_type, post_cluster = cluster),
by = c("post" = "id")
) %>%
filter(!is.na(pre_node_label), !is.na(post_node_label),
pre_node_label != post_node_label) %>%
group_by(pre_node_label, post_node_label, pre_node_type, post_node_type, pre_cluster) %>%
summarise(
synapse_count = sum(count, na.rm = TRUE),
weight = sum(count, na.rm = TRUE),
n_connections = n(),
.groups = "drop"
) %>%
filter(synapse_count >= 100) # Keep substantial connections
# Create vertices with node types and cluster info for coloring
vertices_cluster <- umap_df_annotated %>%
group_by(node_label, node_type, cluster) %>%
summarise(size = n(), .groups = "drop") %>%
rename(name = node_label) %>%
filter(name %in% c(edgelist_cluster_network$pre_node_label,
edgelist_cluster_network$post_node_label)) %>%
distinct(name, .keep_all = TRUE)
# Create graph
g_cluster <- graph_from_data_frame(
d = edgelist_cluster_network %>% select(pre_node_label, post_node_label,
weight, synapse_count),
vertices = vertices_cluster,
directed = TRUE
)
# Convert to tidygraph
g_cluster_tidy <- as_tbl_graph(g_cluster) %>%
activate(nodes) %>%
mutate(degree = centrality_degree(mode = "total"))
# Create layout with increased repulsion (larger area for better spacing)
layout_cluster <- create_layout(g_cluster_tidy, layout = "fr", niter = 1000,
area = vcount(g_cluster_tidy)^2.5)
# Create node color mapping: use cluster colors for clusters, fixed colors for sensory/effector
layout_cluster <- layout_cluster %>%
mutate(
node_color = case_when(
node_type == "sensory" ~ "grey60",
node_type == "effector" ~ "black",
node_type == "cluster" & !is.na(cluster) ~ cluster_colors[as.character(cluster)],
TRUE ~ "steelblue"
)
)
# Plot
p_cluster_network <- ggraph(layout_cluster) +
geom_edge_arc(
aes(width = weight, alpha = weight),
arrow = arrow(length = unit(2, 'mm'), type = "closed"),
start_cap = circle(4, 'mm'),
end_cap = circle(4, 'mm'),
strength = 0.3,
color = "grey60"
) +
geom_node_point(
aes(size = size, color = node_color),
alpha = 0.8
) +
geom_node_text(
aes(label = name),
repel = TRUE,
size = 2.5,
fontface = "bold"
) +
scale_edge_width(range = c(0.2, 1.5), name = "Synapse\nCount") +
scale_edge_alpha(range = c(0.5, 1.0)) +
scale_size_continuous(range = c(3, 12), name = "Neuron\nCount") +
scale_color_identity() +
labs(
title = "Cluster Network with Sensory-Effector Nodes",
subtitle = "Nodes = connectivity clusters + sensory/effector cell classes"
) +
theme_graph() +
theme(
legend.position = "right",
plot.title = element_text(face = "bold", size = 14)
)
save_plot(p_cluster_network, paste0(dataset, "_cluster_network"), width = 14, height = 12)
p_cluster_network
cat("Network nodes:", vcount(g_cluster), "\n")
## Network nodes: 31
cat("Network edges:", ecount(g_cluster), "\n")
## Network edges: 279
cat("Sensory nodes:", sum(vertices_cluster$node_type == "sensory", na.rm = TRUE), "\n")
## Sensory nodes: 11
cat("Effector nodes:", sum(vertices_cluster$node_type == "effector", na.rm = TRUE), "\n")
## Effector nodes: 11
cat("Cluster nodes:", sum(vertices_cluster$node_type == "cluster", na.rm = TRUE), "\n")
## Cluster nodes: 9
Recalling what we learned in tutorial 02, we can read the .swc files for each cluster and visualise their morphologies, to get a sense of what each cluster looks like.
# Sample a few neurons per cluster for visualization
set.seed(42)
neurons_per_cluster <- 10
max_clusters_to_viz <- min(8, n_clusters) # Visualize up to 8 clusters
# Get sample of neurons for each cluster
# Note: slice_sample will use all available neurons if n exceeds group size
cluster_samples <- umap_df %>%
filter(cluster %in% 1:max_clusters_to_viz) %>%
group_by(cluster) %>%
slice_sample(n = neurons_per_cluster) %>%
ungroup()
# Download skeletons
swc_path <- file.path(data_path, dataset_base, "obj", "skeletons")
# Use the session's temporary directory for downloaded SWC files
temp_swc_dir <- tempdir()
# Read neurons for each cluster
neurons_by_cluster <- list()
for (cl in 1:max_clusters_to_viz) {
cluster_ids <- cluster_samples %>%
filter(cluster == cl) %>%
pull(id)
neurons_list <- list()
for (nid in cluster_ids) {
neuron <- read_swc_neuron(nid, swc_path, temp_swc_dir)
if (!is.null(neuron)) {
neurons_list[[as.character(nid)]] <- neuron
}
}
if (length(neurons_list) > 0) {
neurons_by_cluster[[cl]] <- as.neuronlist(neurons_list)
}
}
Now let’s visualize each cluster’s morphologies:
# Note: neurons_by_cluster is indexed by integer position and the loop above
# never sets names(), so we check length and NULL-ness rather than names()
if (length(neurons_by_cluster) >= 1 && !is.null(neurons_by_cluster[[1]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[1]], col = cluster_colors[1], soma = 10000)
}
if (length(neurons_by_cluster) >= 2 && !is.null(neurons_by_cluster[[2]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[2]], col = cluster_colors[2], soma = 10000)
}
if (length(neurons_by_cluster) >= 3 && !is.null(neurons_by_cluster[[3]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[3]], col = cluster_colors[3], soma = 10000)
}
if (length(neurons_by_cluster) >= 4 && !is.null(neurons_by_cluster[[4]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[4]], col = cluster_colors[4], soma = 10000)
}
if (length(neurons_by_cluster) >= 5 && !is.null(neurons_by_cluster[[5]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[5]], col = cluster_colors[5], soma = 10000)
}
if (length(neurons_by_cluster) >= 6 && !is.null(neurons_by_cluster[[6]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[6]], col = cluster_colors[6], soma = 10000)
}
if (length(neurons_by_cluster) >= 7 && !is.null(neurons_by_cluster[[7]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[7]], col = cluster_colors[7], soma = 10000)
}
if (length(neurons_by_cluster) >= 8 && !is.null(neurons_by_cluster[[8]])) {
nclear3d()
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
plot3d(neurons_by_cluster[[8]], col = cluster_colors[8], soma = 10000)
}
In this tutorial, we covered:

- Building a sparse input/output connection matrix from the edgelist
- Scoring neuron-to-neuron similarity with cosine similarity
- Embedding neurons with UMAP and clustering them hierarchically
- Summarising clusters by cell type and super class composition
- Building a cluster-level summary network with sensory and effector nodes
- Visualising the morphologies of each cluster in 3D
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] influencer_0.1.0 remotes_2.5.0 doSNOW_1.0.20
## [4] snow_0.4-4 iterators_1.0.14 foreach_1.5.2
## [7] readobj_0.4.1 dynamicTreeCut_1.63-1 lsa_0.73.3
## [10] SnowballC_0.7.0 tidygraph_1.2.3 ggraph_2.2.1
## [13] igraph_1.5.1 htmlwidgets_1.6.4 uwot_0.1.14
## [16] Matrix_1.6-1.1 umap_0.2.10.0 pheatmap_1.0.12
## [19] heatmaply_1.4.0 viridis_0.6.5 viridisLite_0.4.2
## [22] ggdendro_0.1.23 duckdb_0.9.2-1 DBI_1.2.3
## [25] plotly_4.11.0 nat.flybrains_1.8.2 nat.templatebrains_1.2.1
## [28] nat.nblast_1.6.7 nat_1.11.0 rgl_1.2.8
## [31] patchwork_1.1.3 forcats_0.5.2 stringr_1.6.0
## [34] dplyr_1.1.4 purrr_1.1.0 readr_2.1.5
## [37] tidyr_1.3.1 tibble_3.3.0 ggplot2_4.0.0.9000
## [40] tidyverse_1.3.2 arrow_16.1.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.4.1 backports_1.5.0 spam_2.10-0
## [4] systemfonts_1.2.3 plyr_1.8.9 lazyeval_0.2.2
## [7] splines_4.2.1 crosstalk_1.2.0 digest_0.6.37
## [10] ca_0.71.1 htmltools_0.5.8.1 magrittr_2.0.4
## [13] memoise_2.0.1 googlesheets4_1.1.1 tzdb_0.4.0
## [16] graphlayouts_1.1.1 modelr_0.1.11 extrafont_0.18
## [19] extrafontdb_1.0 askpass_1.2.1 blob_1.2.4
## [22] rvest_1.0.3 rappdirs_0.3.3 ggrepel_0.9.5
## [25] textshaping_0.3.6 haven_2.5.1 xfun_0.54
## [28] crayon_1.5.3 jsonlite_2.0.0 glue_1.8.0
## [31] polyclip_1.10-4 registry_0.5-1 gtable_0.3.6
## [34] gargle_1.6.0 webshot_0.5.5 Rttf2pt1_1.3.11
## [37] scales_1.4.0 Rcpp_1.0.11 reticulate_1.34.0
## [40] bit_4.6.0 dotCall64_1.1-1 httr_1.4.7
## [43] RColorBrewer_1.1-3 nabor_0.5.0 pkgconfig_2.0.3
## [46] farver_2.1.2 sass_0.4.8 dbplyr_2.2.1
## [49] utf8_1.2.6 reshape2_1.4.4 labeling_0.4.3
## [52] tidyselect_1.2.1 rlang_1.1.6 cellranger_1.1.0
## [55] tools_4.2.1 cachem_1.1.0 cli_3.6.5
## [58] generics_0.1.4 RSQLite_2.3.4 broom_1.0.6
## [61] evaluate_1.0.5 fastmap_1.2.0 ragg_1.2.4
## [64] yaml_2.3.10 knitr_1.50 bit64_4.6.0-1
## [67] fs_1.6.3 filehash_2.4-6 dendroextras_0.2.3
## [70] nat.utils_0.6.1 dendextend_1.17.1 nlme_3.1-160
## [73] xml2_1.3.6 compiler_4.2.1 rstudioapi_0.17.1
## [76] png_0.1-8 reprex_2.0.2 tweenr_2.0.2
## [79] bslib_0.6.1 stringi_1.8.3 RSpectra_0.16-1
## [82] lattice_0.20-45 vctrs_0.6.5 pillar_1.11.1
## [85] lifecycle_1.0.4 jquerylib_0.1.4 RcppAnnoy_0.0.20
## [88] data.table_1.16.2 seriation_1.4.0 R6_2.6.1
## [91] TSP_1.2-1 gridExtra_2.3 codetools_0.2-18
## [94] dichromat_2.0-0.1 MASS_7.3-58.1 assertthat_0.2.1
## [97] openssl_2.3.4 withr_3.0.2 mgcv_1.8-41
## [100] parallel_4.2.1 hms_1.1.3 grid_4.2.1
## [103] rmarkdown_2.30 S7_0.2.0 googledrive_2.1.1
## [106] ggforce_0.4.1 lubridate_1.8.0 base64enc_0.1-3